5. Computational Complexity Theory
The analysis of algorithms and theory of computational complexity seek
to establish upper and lower bounds on the minimum amount of computation
required by various problems, and to make these bounds as tight as possible.
Computational complexity also establishes coarse equivalence classes of
computational problems.
To date there are many more upper bounds on computation than there are
lower bounds. Upper bounds are more easily established because any algorithm
which accomplishes a given function establishes an upper bound on the minimum
computation required. Fortunately, greater emphasis is now being placed on the
search for lower bounds, and these are what are needed to ensure that
cryptanalysis is computationally infeasible. As work progresses in this area of
computer science it may unwittingly provide the basis for provably secure
cryptosystems. Conversely, if part of the security of a cryptosystem is based
on the assumed difficulty of a widely studied problem, such as the traveling
salesperson problem, it is likely that any error will be found by a
disinterested third party, not one of our opponents. Even if an opponent were to
find such a flaw in our system, the incentive to publish the solution to a
famous problem might outweigh the advantage of being able to read our mail. We
thus feel that cryptosystems which make use of generally believed lower bounds
on computation to add to their security are of interest even though the theory
of computational complexity is still in its infancy.
We will first review several results in computational complexity to
illustrate the basic concepts. The sorting problem is one of the few problems
which currently have tight lower bounds on computation and, although it does
not appear to be useful in cryptography, it illustrates several points of
interest. The problem is: given a randomly ordered sequence of n different
numbers {X1, X2, ... Xn}, sort them into ascending order X1'<X2'< ... <Xn'. The
first problem in establishing the complexity of this operation is to decide on
a measure for computation. On a computer, there are two primary operations:
comparing two numbers in the list to see which is larger, and rearranging the
list on the basis of these comparisons. While the time it takes to rearrange
the list is not negligible in practice, for the sake of simplicity we will
count only the number of comparison operations required in defining the
complexity of the sorting problem.
As is well known, each comparison yields at most one bit of information
and there are log2(n!) ≈ n log2 n bits of uncertainty concerning the initial
ordering. Therefore the complexity K of the sorting problem obeys K ≥
log2(n!) ≈ n log2 n.
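To make the bound concrete, the following short sketch (ours, in Python, and no part of the original argument) counts the comparisons made by an ordinary merge sort of n = 64 random numbers and prints log2(n!) for comparison.

    import math, random

    def merge_sort(seq, counter):
        # Standard merge sort, instrumented to count element comparisons.
        if len(seq) <= 1:
            return seq
        mid = len(seq) // 2
        left = merge_sort(seq[:mid], counter)
        right = merge_sort(seq[mid:], counter)
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            counter[0] += 1                      # one comparison
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        return merged + left[i:] + right[j:]

    n = 64
    counter = [0]
    merge_sort(random.sample(range(10**6), n), counter)
    print("comparisons used:", counter[0])
    print("lower bound log2(n!):", round(math.lgamma(n + 1) / math.log(2)))

On a typical run the count comes out only a few percent above the bound of roughly 296 comparisons, which already suggests how tight the bound is.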
We have brushed over an important question. Does complexity refer to a
worst case, average, or some other situation? Using the source coding theorem
[] we can show that () can be interpreted as a bound on both the worst case and,
somewhat more strongly, on the average computation (number of comparisons). But
even average computation is too weak for cryptographic purposes. If
cryptanalysis requires 10↑20 operations on the average, but this is due to
10↑-10 of the cryptograms requiring 10↑30 operations and the remaining
cryptograms requiring no time to break, we have a very weak cipher. Rather we
need results more closely related to the idea of "typicality" in information
theory.
Fortunately, use of the Kraft inequality [] allows us to show that for
any sorting algorithm the fraction nα of orderings requiring at most
log2(n!) - α comparisons obeys the bound nα ≤ 2↑-α. For example, if n=64 then
log2(n!) = 296 bits, and the fraction of orderings requiring 280 comparisons or
fewer is at most 1.5*10↑-5. The fraction requiring 200 comparisons or fewer is
at most 1.3*10↑-29.
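These figures are easy to check numerically; the short Python fragment below (ours) reproduces them.

    import math

    n = 64
    bits = math.lgamma(n + 1) / math.log(2)          # log2(n!), about 296 bits
    for c in (280, 200):
        alpha = bits - c
        print("%d comparisons or fewer: fraction <= %.1e" % (c, 2 ** -alpha))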
Having derived a lower bound on computation, we might next wonder about
its tightness. There are algorithms [] which require only n(log2 n + 1)
comparisons, even in the worst case. However, because they require a large
number of memory accesses, they are not as useful as some other algorithms []
which take about 2n log2 n comparisons on the average and n↑2 in the worst
case. This points to a general problem: if one type of operation is not
counted in evaluating complexity, a low complexity algorithm will often use too
many of these operations to be of practical value. If, for example, only
multiplications and divisions are counted, a "low complexity" algorithm will do
these operations by repeated addition or subtraction.
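As a caricature of this effect, the following sketch (ours) performs multiplication with zero "counted" multiplications but with a number of uncounted additions proportional to the multiplier itself.

    def multiply(a, b):
        # No multiplications are "counted", but the uncounted additions grow
        # with the value of b rather than with the length of its representation.
        total = 0
        for _ in range(b):
            total += a
        return total

    print(multiply(123, 4567))       # 561741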
In using computational complexity to certify a cryptosystem these
problems are largely absent. We do not need tight lower bounds on cryptanalytic
computation. Rather, any sufficiently large lower bound will do. If we have
not counted one type of operation, that only adds to the cryptanalyst's task
over and above our lower bound.
However, if we neglect too many real costs in measuring complexity,
sufficiently large lower bounds may fail to exist. For example, if memory is
not included in the complexity measure, one-way functions cannot exist because
table lookup can always be used to compute f↑-1(.). Since this requires
enormous precomputation, the existence of quasi one-way functions might still
be proved.
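The following sketch (ours, with a toy function standing in for f) shows why: once precomputation and memory are free, any function on a finite domain is inverted by a single table lookup.

    def f(x):
        # A toy function on a 16-bit domain; a cryptographic f would have a
        # domain far too large to tabulate, which is why memory must be counted.
        return (x * x + 1) % 2**16

    table = {f(x): x for x in range(2**16)}          # enormous precomputation
    def f_inverse(y):
        return table.get(y)                          # inversion is one lookup

    print(f(f_inverse(f(12345))) == f(12345))        # True: a preimage is found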
Unfortunately, most of the currently known results in computational
complexity are not applicable to cryptography. For example, the n log2 n
complexity of sorting is too small to be useful. Similarly, although evaluation
of an nth degree polynomial is known to require n multiplies and n additions in
general[], this is also too small for cryptographic purposes.
Let us therefore jump from these easily computed functions to
uncomputable functions. These are functions for which no algorithm will always
give the correct answer. For example, although it is obviously a well defined
function, it is impossible to compute the function from the set of ALGOL
programs to the set {HALTS, RUNS FOREVER}, in which f(program) = HALTS if and
only if the program eventually reaches the STOP statement []. We are
considering programs which include all input data as part of the program and
which can address an unlimited memory.
One cannot merely let the program run until it stops to get the answer.
If after a trillion instructions the program is still running, what is our
answer? In fact, Blum [] has shown that the truncated halting problem of
deciding whether a program stops within N steps will often require at least N
steps to decide.
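A toy illustration of the truncated problem (ours; here a "program" is modeled as a Python generator that yields once per step): a bounded simulation can report HALTS or STILL RUNNING, but it can never certify RUNS FOREVER.

    def looping_program():
        while True:
            yield                        # never halts

    def counting_program():
        for _ in range(1000):
            yield                        # halts after 1000 steps

    def status_after(program, n_steps):
        # Simulate at most n_steps steps.  "STILL RUNNING" is not the same as
        # "RUNS FOREVER"; no finite simulation can establish the latter.
        steps = program()
        for _ in range(n_steps):
            try:
                next(steps)
            except StopIteration:
                return "HALTS"
        return "STILL RUNNING"

    print(status_after(counting_program, 10**6))     # HALTS
    print(status_after(looping_program, 10**6))      # STILL RUNNING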
Unfortunately, uncomputable functions are as inapplicable to
cryptography as are the operations such as sorting whose complexity is easily
bounded. This is because all cryptanalytic problems belong to a class known as
NP (for nondeterministic polynomial). As shown below, an NP problem is
computable by definition.
A function is said to belong to the complexity class P (for
deterministic polynomial) if it can be computed by a deterministic universal
computer in a time which is always bounded above by some polynomial function of
the length of its input. One might think of these as the class of easily
computed functions, but it is more accurate to say that a function not in this
class must be hard to compute for at least some inputs. There are problems,
such as the truncated halting problem, which are clearly not in the class P.
There is an interesting class of problems which often arise in
engineering and which cannot be solved in polynomial time by any known
techniques unless they are run on a computer with an unlimited degree of
parallelism. These problems may not belong to the class P, but clearly belong
to the class NP (for non-deterministic, polynomial) of problems which can be
solved in polynomial time on a "non-deterministic" computer (i.e., one with an
unlimited degree of parallelism). Clearly the class NP includes the class P, and
one of the great open questions in complexity theory is whether the class NP is
strictly larger.
Among the problems known to be solvable in NP time, but not known to be
solvable in P time, are the traveling salesperson problem, the problem of
deciding tautologies, the knapsack problem, the graph coloring problem, and many
scheduling and minimization problems []. We see that it is not lack of interest
or work which has prevented people from finding solutions in P time for these
problems. It is thus strongly believed that at least one of these problems
must not be in the class P, and that therefore the class NP is strictly larger.
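The knapsack problem mentioned above makes the distinction concrete. In the sketch below (ours, on an arbitrary small instance), verifying a proposed solution takes time polynomial in the size of the instance, while the obvious search examines all 2↑n subsets.

    from itertools import combinations

    def verify(weights, subset, target):
        # Checking a proposed solution: polynomial in the size of the instance.
        return sum(weights[i] for i in subset) == target

    def search(weights, target):
        # The obvious search tries all 2^n subsets: exponential in n.
        n = len(weights)
        for r in range(n + 1):
            for subset in combinations(range(n), r):
                if verify(weights, subset, target):
                    return subset
        return None

    weights = [14, 28, 56, 82, 90, 132, 197, 284, 341, 455]
    print(search(weights, 515))          # indices of a subset summing to 515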
Karp[] has shown that there is a subclass of NP called the NP complete
problems which are all in P if any one of them is. Karp lists twenty-one
problems which are NP complete, including all of the problems mentioned above.
As shown in the last section, the NP complete problems show promise for
cryptographic use. But also as shown there, the currently stated nature of
their difficulty (i.e., worst case) makes them not directly of use. Further, if
we replace worst case computation time with average or typical computation time
as our complexity measure, the current proof of the equivalences among the NP
complete problems is no longer valid. This suggests several interesting topics
for research.
At this point, we leave the current bounds of computational complexity
theory and turn to the problem of logs mod q, which was introduced in the last
section as a potential one-way function. The only reason that this problem is
currently outside of computational complexity is that no one has successfully
established a lower bound on its computation. As we shall see, if its
complexity could be established to be adequately high, it would yield a provably
secure cryptosystem. Since the calculation of logs mod q is also of importance
in other areas of information theory [], we suggest it deserves further study.
The cipher would be implemented as C = P↑K, mod q; P = C↑D, mod q, where
DK = 1, mod (q-1) (since a↑q = a, mod q); and P and C are integers between 1 and q-1
inclusive. For D to exist it is necessary that K be relatively prime to q-1,
but since a significant fraction of integers 1≤K≤q-1 are relatively prime to
q-1, the key space is not greatly reduced. Further since it is relatively easy
to find D from K (or to determine that K is not usable), generation of keys,
enciphering, and deciphering are all simple operations whose complexity grows
like log(q).
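A minimal sketch of the cipher in Python (ours; the function names and the small prime are for illustration only, a real system needing a q of perhaps a thousand bits):

    from math import gcd

    q = 65543                    # a small prime, for illustration only
    K = 12345                    # enciphering key; must satisfy gcd(K, q-1) = 1
    assert gcd(K, q - 1) == 1
    D = pow(K, -1, q - 1)        # deciphering key, so that DK = 1, mod (q-1)

    def encipher(P):
        return pow(P, K, q)      # C = P^K, mod q

    def decipher(C):
        return pow(C, D, q)      # P = C^D, mod q

    P = 4242
    C = encipher(P)
    assert decipher(C) == P      # deciphering undoes enciphering

All three operations reduce to modular exponentiation, which by repeated squaring takes on the order of log(q) multiplications.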
Cryptanalysis under a known plaintext attack, however, is equivalent to
finding logs mod q, since K is the logarithm of C to the base P, mod q.
Knowledge of a number of randomly chosen P-C pairs cannot reduce the difficulty
of cryptanalysis, since equivalent information can be obtained from a single
P-C pair by raising both P and C to the same, randomly chosen powers. If
C = P↑K then C' = C↑m and P' = P↑m are also a P-C pair since C' = (P')↑K.
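For example, continuing the illustrative numbers above (ours):

    q, K = 65543, 12345
    P = 4242
    C = pow(P, K, q)                 # one known plaintext-ciphertext pair

    m = 777                          # any exponent the cryptanalyst chooses
    P2, C2 = pow(P, m, q), pow(C, m, q)
    assert C2 == pow(P2, K, q)       # (P^m, C^m) is again a valid pair under K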
While we cannot show that a chosen plaintext attack also reduces to the
hopefully difficult problem of finding logs mod q, neither have we found any
methods by which a chosen plaintext attack can benefit the cryptanalyst. Note the mathematical
simplification which results from considering a known plaintext as opposed to a
statistical attack.
As previously noted, Knuth [] gives an algorithm which requires O(q↑.5)
operations and O(q↑.5) words of memory. For q=2↑1000 these requirements are so
large as to be precluded by any developments in computer technology. This is
because quantum limitations set bounds on the minimum amount of energy necessary
to do even the simplest operation [].
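For reference, here is a sketch (ours) of one well-known square-root-time method, Shanks' baby-step giant-step search, which uses about q↑.5 stored values and q↑.5 multiplications; whether or not this is exactly the algorithm Knuth gives, it shows where both requirements come from.

    from math import isqrt

    def discrete_log(base, target, q):
        # Find x with base^x = target, mod q, in about q^.5 multiplications.
        m = isqrt(q - 1) + 1
        baby = {pow(base, j, q): j for j in range(m)}    # about q^.5 stored values
        giant = pow(base, -m, q)                         # base^(-m), mod q
        y = target
        for i in range(m):
            if y in baby:
                return i * m + baby[y]
            y = (y * giant) % q
        return None

    q = 65543
    print(discrete_log(5, pow(5, 31415, q), q))          # recovers 31415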
However in studying the problem of logs mod q for cryptographic
purposes, Stephen Pohlig[] of Stanford found a technique which allows logs mod q
to be calculated in time proportional to log(q) provided q-1 has no large prime
factors. For example, if q = 65537 = 2↑16 + 1, then q-1 = 2↑16 has 2 as its
largest prime factor, and computing a logarithm takes only 16 iterations of a
moderately simple loop. On the other hand, if q = 65543, then q-1 = 2(32771) is
the prime factorization of q-1, with 32771 prime. Pohlig's method then requires
on the order of 2(32771)↑.5 = 362 words of memory and
(32771)↑.5 log2((32771)↑.5) = 1358 operations. In general, if q = 2p+1 where p
is also prime, then Pohlig's method requires almost as much computation and
memory as
Knuth's. For logs mod q to be useful in a provably secure cryptosystem we must
show that adequately large primes of the form q=2p+1 exist, and that for primes
of this form the calculation of logs mod q takes at least q↑.5 (or some other
power of q) operations.
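To illustrate the easy case, the sketch below (ours, not Pohlig's own procedure) recovers a logarithm mod q = 65537 with one pass of a simple loop per bit of q-1 = 2↑16, assuming the base is a primitive root (3 is one for this q).

    def pohlig_log(g, y, q):
        # Recover x with g^x = y, mod q, when q-1 is a power of two and g is a
        # primitive root mod q.  Each loop iteration determines one bit of x.
        n = q - 1
        k = n.bit_length() - 1                   # q - 1 = 2^k
        g_inv = pow(g, -1, q)
        x = 0
        for i in range(k):
            # Strip off the bits of x found so far, then test the next bit:
            # the power is +1 if bit i of x is 0 and -1, mod q, if it is 1.
            z = pow(y * pow(g_inv, x, q) % q, n >> (i + 1), q)
            if z != 1:
                x += 1 << i
        return x

    q = 65537                                    # q - 1 = 2^16: just 16 iterations
    print(pohlig_log(3, pow(3, 40000, q), q))    # recovers 40000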